English-Chinese Bi-Directional OOV Translation based on Web Mining and Supervised Learning

Authors

  • Yuejie Zhang
  • Yang Wang
  • Xiangyang Xue
Abstract

In Cross-Language Information Retrieval (CLIR), Out-of-Vocabulary (OOV) detection and the evaluation of translation pair relevance remain key problems. This paper presents an English-Chinese bi-directional OOV translation model that uses Web mining as the corpus source to collect translation pairs and applies supervised learning to evaluate their degree of association. The experimental results show that the proposed model can select the most probable translation candidate at a lower computational cost and improve OOV translation ranking, especially for popular new words.
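The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of collecting candidate translations from Web text and scoring their association with an OOV term. It swaps in a plain co-occurrence ratio for the paper's supervised association model, and the function name `rank_candidates`, the n-gram candidate generation, and the example snippets are all hypothetical.

```python
import re
from collections import Counter

# Runs of CJK ideographs; a rough stand-in for real candidate extraction.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def cjk_ngrams(text, min_len=2, max_len=4):
    """Return the set of CJK character n-grams (length min_len..max_len) in text."""
    grams = set()
    for run in CJK_RUN.findall(text):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                grams.add(run[i:i + n])
    return grams

def rank_candidates(term, snippets):
    """Rank Chinese n-grams by the share of term-bearing snippets they appear in.

    This is a simple co-occurrence ratio, NOT the supervised model
    described in the abstract.
    """
    term_lower = term.lower()
    hits = [s for s in snippets if term_lower in s.lower()]
    if not hits:
        return []
    counts = Counter()
    for snippet in hits:
        # Each candidate is counted at most once per snippet.
        counts.update(cjk_ngrams(snippet))
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cand, n / len(hits)) for cand, n in ordered]

# Hypothetical search-result snippets for the English OOV term "blog".
snippets = [
    "博客 (blog) 是一种网络日志",
    "什么是博客? Blog 入门",
    "blog 博客 写作技巧",
    "天气预报 今日晴",
]
ranked = rank_candidates("blog", snippets)
print(ranked[0])
```

In a supervised setting, a ratio like this would more plausibly serve as one input feature to a trained ranker than as the final score.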


Similar resources

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection

Cross-Language Information Retrieval (CLIR) systems use dictionaries for information retrieval. However, out-of-vocabulary (OOV) terms cannot be found in dictionaries. Although many researchers have endeavored to solve the OOV term translation problem, little attention has been paid to hybrid translations such as “α1-antitrypsin deficiency (α1-抗胰蛋白酶缺乏症)”. This paper presents a novel OOV ...


Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery

Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese–English CLIR. In Chinese–English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence ...


Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages

This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The work is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-ba...


A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment

In this paper, we propose a novel system for translating organization names from Chinese to English with the assistance of web resources. First, we adopt a chunking-based segmentation method to improve the segmentation of Chinese organization names, which is plagued by the OOV problem. Then a heuristic query construction method is employed to construct an efficient query which can be used to se...


Semi-supervised Chinese Word Segmentation based on Bilingual Information

This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmenting information of English sentences. The proposed method involves learning three levels of features, namely character-level, phrase-level and sentence-level, provided by multiple sub-models. We use a sub-model of conditional random fields (CRF) to learn monolingual grammars, ...



Publication date: 2009